Accurate extraction of road networks from high-resolution satellite imagery is essential for a range of geospatial applications, including urban planning, autonomous navigation, disaster management, and geographic information systems (GIS). However, automatic road segmentation remains challenging due to occlusions, spectral similarity with surrounding structures, and complex road geometries. This study proposes a deep-learning framework for road extraction built on a transformer-based semantic segmentation architecture. The model employs a SegFormer-B1 backbone to capture both local spatial features and long-range contextual information from satellite images. Experiments were conducted on the DeepGlobe Road Extraction dataset, which consists of high-resolution satellite imagery and corresponding road masks. The proposed approach achieved strong segmentation performance, with a mean Intersection over Union (IoU) of 0.7879, a Dice score of 0.8632, and a pixel accuracy of 97.8%. Qualitative results further demonstrate that the model captures major road structures while maintaining good alignment with ground-truth annotations. These results highlight the effectiveness of transformer-based architectures for road extraction and their potential for large-scale geospatial mapping applications.
Introduction
Accurate road maps are critical for applications such as autonomous navigation, urban planning, disaster response, and GIS updates. Despite advances in high-resolution imaging, automatic road extraction remains challenging due to occlusions, spectral similarity with surrounding objects, complex road geometries, and the need to balance fine-grained detail against long-range network continuity. This study addresses these challenges with a deep-learning framework for road extraction from high-resolution satellite imagery.
Key points of the study:
Dataset: Uses the DeepGlobe Road Extraction dataset, comprising 6,226 high-resolution RGB satellite images (1024×1024 pixels) with corresponding binary road masks. Preprocessing includes mask binarization, image normalization, one-hot encoding of labels, and data augmentation via horizontal/vertical flipping.
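The preprocessing steps above can be sketched in NumPy. This is a minimal illustration, not the authors' actual pipeline; the function names (`preprocess_pair`, `augment_flips`) and the binarization threshold of 128 are assumptions for demonstration.

```python
import numpy as np

def preprocess_pair(image, mask, threshold=128):
    """Prepare one image/mask pair: normalize the image to [0, 1],
    binarize the road mask, and one-hot encode the two classes."""
    image = image.astype(np.float32) / 255.0           # normalize pixel values
    binary = (mask >= threshold).astype(np.int64)      # binarize road mask
    one_hot = np.stack([1 - binary, binary], axis=-1)  # background / road channels
    return image, one_hot

def augment_flips(image, mask, rng):
    """Random horizontal/vertical flips applied jointly to image and mask."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]    # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]    # vertical flip
    return image, mask
```

Flipping image and mask together keeps the annotation aligned with the augmented input, which is why the two arrays are transformed in one call.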
Model Architecture: Employs SegFormer-B1, a transformer-based semantic segmentation model, combining a hierarchical transformer encoder with a lightweight decoder to capture both local details and global context. The output is adapted for binary road segmentation.
Training: The model is optimized with Dice loss using the Adam optimizer (learning rate 8×10??, batch size 4) for 20 epochs. Pretrained ADE20K weights are used to initialize the encoder and enhance feature learning.
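The Dice loss used for training penalizes the mismatch between predicted probabilities and one-hot targets. Below is a minimal NumPy sketch of the standard soft-Dice formulation; the tensor layout (N, C, H, W) and the smoothing constant `eps` are illustrative assumptions, not details from the paper.

```python
import numpy as np

def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss: 1 - mean over classes of 2*|P∩T| / (|P| + |T|).

    probs:   (N, C, H, W) predicted class probabilities
    targets: (N, C, H, W) one-hot ground-truth masks
    """
    intersect = (probs * targets).sum(axis=(0, 2, 3))          # per-class overlap
    denom = probs.sum(axis=(0, 2, 3)) + targets.sum(axis=(0, 2, 3))
    dice_per_class = (2.0 * intersect + eps) / (denom + eps)   # eps avoids 0/0
    return 1.0 - dice_per_class.mean()                         # 0 = perfect match
```

Unlike pixel-wise cross-entropy, this loss is driven by region overlap, which helps on road masks where foreground pixels are a small fraction of the image.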
Evaluation: Metrics include precision, recall, F1-score (equivalent to the Dice score for binary segmentation), mean IoU, Dice score, and pixel accuracy. The model achieved strong performance (F1/Dice: 0.8632, mean IoU: 0.7879, pixel accuracy: 97.8%), accurately segmenting road pixels while preserving network continuity.
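The reported metrics follow standard confusion-matrix definitions. A minimal sketch for the binary (road vs. background) case, assuming the hypothetical helper name `segmentation_metrics`:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Pixel accuracy, IoU, and Dice for binary road masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()     # road predicted as road
    fp = np.logical_and(pred, ~gt).sum()    # background predicted as road
    fn = np.logical_and(~pred, gt).sum()    # road missed
    tn = np.logical_and(~pred, ~gt).sum()   # background correctly rejected
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    acc = (tp + tn) / pred.size
    return {"iou": float(iou), "dice": float(dice), "pixel_acc": float(acc)}
```

Note that pixel accuracy can be very high even for weak road masks, because background dominates the image; IoU and Dice are the more informative figures for this task.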
Summary: The SegFormer-based framework effectively addresses the trade-off between local boundary precision and long-range structural continuity in road extraction, achieving robust and high-accuracy results across diverse urban and rural satellite imagery.
Conclusion
This study presented a deep-learning framework for automatic road extraction from high-resolution satellite imagery using a SegFormer-based semantic segmentation model. The transformer-based architecture effectively captures both the local spatial features and the global contextual information required for accurate road segmentation. Experiments on the DeepGlobe Road Extraction dataset demonstrate strong performance, achieving a mean IoU of 0.7879, a Dice score of 0.8632, and a pixel accuracy of 97.8%. These results indicate that the model can reliably identify road structures while maintaining consistent segmentation across complex satellite scenes. Qualitative analysis further shows that the model delineates major road networks with good alignment to ground-truth masks. Future work will focus on improving the detection of narrow and occluded roads by incorporating multi-scale feature fusion, boundary refinement techniques, and topology-aware learning strategies to enhance segmentation accuracy and structural continuity.